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ENERGETICALLY FAVORABLE BINDING SITE DETERMINATION 
BETWEEN TWO MOLECULES 

Field of the Invention 

5 The present invention relates to a method for generating a position value for an 

energetically favorable binding site between two molecules. More particularly, the 
invention relates to a computational tool for the determination of energetically favorable 
binding sites in macromolecules and further, to a high resolution fast quantitative 
docking determination tool using Fourier domain correlation techniques. 

10 

Background of the Invention 

The development of computation tools for the determination of energetically 
favorable binding sites in macromolecules, referred to as "docking", has been of 
considerable interest during the past decade. These tools have been introduced to assist 

15 structure-based drug design. The approaches fall into two main categories, namely, 
qualitative and quantitative methods. Qualitative methods are restricted primarily to 
calculations based on shape, complementarity and consist of finding the best fit between 
two shapes. A computer program called "Dock" exemplifies this approach and has been 
described in the paper authored by B.K. Shoichet and I D. Kuntz entitled "Matching 

20 chemistry and shape in molecular docking", which appeared in Protein Engineering, 
7:723-732, 1993. Quantitative methods are based primarily on energy calculations 
seeking to determine the global minimum energy of the ligand binding interaction with 
the protein target. This approach is described in the paper by P.A. Kollman entitled 
"Free energy calculations: applications to chemical and biochemical phenomena", 

25 published in Chem. Rev. 93:2395-2417, 1993. Hybrid methods also exist in the form of 
calculating interaction energy between a given protein and small molecular fragments 
which are then assembled based on shape, complementarity criteria to form new 
molecules. This approach is described in the paper by P.A. Goodford entitled 
"Computational procedure for determining energetically favorable binding sites on 

30 biologically important molecules", published in J. Med. Chem, 28:849-857, 1985. More 
recently, a successful prediction was achieved in describing the binding of P-lactamase 
inhibitor protein with TEM-1 p-lactamase by employing both quantitative and 
qualitative methods as well as a combination thereof This approach is documented in 
the paper by N. Strynadka et al entitled "Molecular docking program successfully 

35 predict the binding of P-lactamase inhibitory protein to TEM-1 P-lactamase", published 
in Nature Struct. Biol, 3(3):233-239, 1996. 

When developing a computational tool for the determination of favorable binding 
sites in molecules, according to the quantitative methods, it is possible to simulate 
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intermolecular movement by computing intermolecular forces to determine a preferred 
"docking" site between the molecules. It is also possible to calculate the energy of 
interaction between the two molecules to determine as the best binding site those sites 
which have the most favorable or minimum potential energy. The predictive accuracy of 
5 any quantitative method is limited by the resolution or precision of the model. In most 
calculations, a grid is used onto which the molecule structures are mapped. This 
mapping takes place with or without some kind of transfer function, e.g. a 1/r-function 
in the case of electrostatic potential description. The calculation of the interaction 
between the two molecules, such as calculating the potential energy between the two 
10 molecules, must be done for each relative position of the two molecules, namely, each 
relative translational position and each rotational orientation between the two molecules. 
Since the amount of computation is extremely high, the prior art approach has been to 
select a moderate grid resolution before performing a calculation of favorable binding 
sites between molecules. 



15 



20 



25 



Summary of the Invention 

It is an object of the present invention to make use of correlation between a 
potential grid representing one molecule and an interaction field grid representing 
another molecule to obtain for each selected relative rotation between the two molecules 
a potential energy representing a binding energy of the two molecules for relative 
translational positions in space between the two molecules. Using a single complex 
correlation calculation for each relative rotation between the two molecules, the 
resulting grids can be scanned to obtain the most energetically favorable binding site 
between two molecules. By using a grid resolution in the range of 0.25 A-0.45 A, it has 
been found that at such high resolution the correlation technique by convolution of two 
grids representing the two molecules provides very acceptable quantitative results for 
molecule binding energy for all relative translational positions in space between the two 
molecules. 

According to the invention, there is provided a method for generating a position 
30 value for an energetically favorable binding site between two molecules, comprising the 
steps of a) obtaining potential energy structural data for each atom site in the 
molecules; b) selecting a grid resolution corresponding to a sampling grid size 
substantially smaller than an average distance between bonded atoms in the molecules; 
c) selecting a range of relative rotations between the two molecules; d) mapping a 
35 plurality of potential energy field components of one of the molecules onto a 
corresponding one of a plurality of energy field component grids having the resolution 
with one molecule at a predetermined rotation and position, wherein each grid point of 
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the component grids has a potential energy value interpolated from the potential energy 
structural data; e) mapping a plurality of interaction field components of another of the 
molecules onto a corresponding one of a plurality of interaction component grids having 
the resolution with the other molecule at a predetermined rotation and position, the 
5 interaction component corresponding to coefficients of a forcefield between the 
molecules, wherein each grid point of the component grids has an interaction value 
interpolated from the potential energy structural data; f) calculating a correlation 
between each potential energy field component grid and each interaction field 
component grid to obtain a grid of molecule binding energy values representing- a 

10 binding energy of the two molecules in the relative rotation for relative translational 
positions in space between the molecules; g) determining at least one maximum of the 
binding energy values and recording the relative translational positions for the maximum 
binding energy values; h) rotating at least one of the molecules according to each 
relative rotation in the range, repeating the step of mapping for the at least one of the 

15 molecules and subsequently repeating the steps (f) and (g) of calculating and 
determining for each relative rotation; and i) selecting an energetically favorable one of 
the relative rotations in the range and the relative translational positions based on the 
maximum binding energy values to generate the position value for an energetically 
favorable binding site between the two molecules. 

20 As can be appreciated, since only one molecule needs to be rotated relative to the 

other, the map of one of the molecules can be used repeatedly while the map of the 
second molecule can be recalculated for each new rotational position. Since the 
interaction field components are easier to map, it is preferred that only the interaction 
component grids be remapped for each new rotation. Also preferably, the preferred 

25 transform for carrying out the correlation is the discrete Fourier transform. 

Preferably, the potential energy field components consist of the electrostatic 
potential which is based on Coulomb's law and varies as a function of 1/r, a second 
component for the first Van der Waals term A which varies as a function of 1/r 12 and a 
third component grid for the second Van der Waals term B which varies as a function of 

30 1/r 6 . As can be appreciated, selection of the Van der Waals terms is a matter of 
selection of a preferred model and other models other than the Lennard Jones 6-12 
potential energy description may be used. The result of the correlation for each field 
component must be summed with the results of the other components in order to obtain 
a total binding energy of the two molecules for the given relative rotation and for each 

35 relative translational position in space provided within the grid. 

It is preferred according to the invention to record a plurality of most energetically 
favorable maximum binding energy values. In order to select the most energetically 
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favorable binding site between two molecules, it is preferred to apply Boltzmann's 
statistics in order to determine the probabilities of occupying the various most 
energetically favorable binding energy states in order to determine which state is most 
probable and generate as the position value the position value corresponding to the 
5 relative position between the two molecules corresponding to the most probable binding 
energy value. 

Also preferably, the method may be used at a plurality of resolutions in order to 
accommodate ease of computation. For example, a first stage of generating the best 
position value or best position values at a grid resolution of 0.4 A may be confirmed by 

10 proceeding with a 0.25 A grid resolution in which the range of relative rotations 
between the two molecules are centered closely around each of the relative rotations 
determined at the initial lower resolution. Since higher resolution requires greater 
memory and manipulation of data during calculation, the grid dimensions at the higher 
resolution may be selected knowing in advance what grid dimensions will accommodate 

1 5 the anticipated result while working with smaller dimensions of grids. 

The invention will be better understood by way of the following detailed 
description of a preferred embodiment of the invention in which: 

Figure 1 shows an exemplary energy vs. radius diagram for the total potential 
energy of an atom in a molecule in proximity with another atom of another molecule in 

20 which the electrostatic potential, repulsive 1/r 12 Van der Waals potential and the 
attractive 1/r 6 potential are also shown separately and the maximum negative binding 
energy trough can be seen, according to the prior art; 

Figure 2 is a two-dimensional projection of dihydroxy acetonephosphate 
superimposed on a grid having a resolution of 0.4 A; 

25 Figure 3 is a flow diagram illustrating the method steps according to the preferred 

embodiment; and 

Figure 4 is a block diagram of the position generator system according to the 
preferred embodiment. 

30 Detailed Description of the Preferred Embodiment 

In the preferred embodiment, the molecules whose most energetically favorable 
binding site position values to be generated are macromolecules, namely, a protein and a 
ligand. The protein typically has a very large number of atoms in its structure in the 
order of 5000 or greater. The ligand is typically between 1-25% of the size of the 
35 protein. 

The potential energy of the system consisting of the protein and ligand is 
calculated by determining the potential energy field created by the protein and then 
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calculating the potential energy resulting from the contribution of each atom in the 
ligand for a particular position in space within the potential energy field of the protein. 
The potential energy is calculated using three basic terms. The first term is the 
electrostatic potential. This results from an electrostatic charge at a particular atom 
5 within the ligand interacting with the electrostatic field potential created by the 
molecule. Such potentials are greater in polar or ionic molecules. The second and third 
potential energy terms come from the Van der Waals potentials. In the preferred 
embodiment, the 6-12 Lennard Jones potential is used. As illustrated in Figure 1, the 
combination of the three potential energy terms may lead to a potential energy minimum 

10 (maximum binding energy) as a particular radial distance. Potential terms can be 
extended by an explicit term for hydrogen bond interaction. 

For the chosen protein and the chosen ligand, data concerning static charge at the 
atom sites in the molecules as well as the coefficients for the Van der Waals forces are 
obtained from existing databases. Such potential energy structural data is originally 

15 determined empirically and/or by theoretical model calculations. Next, a grid resolution 
corresponding to a sampling grid size substantially smaller than an average distance 
between atoms in the molecules is selected. It has been found that a sampling grid size 
of 0.4 A provides, in most cases, sufficiently high resolution to obtain good results for 
protein ligand pairs. A grid resolution of 0.25 A, while computationally more intensive, 

20 has been found to provide substantially more accurate results. While it would be 
possible to select a grid resolution which is higher for the 1/r 12 component than for the 
1/r 6 component, and even then a lesser resolution for the 1/r component of the potential 
energy field components, the individual component energy correlation calculation would 
be faster for the lower resolution grids, however, there would result the complication 

25 that addition for each point in space of the total binding energy would require 
calculating an interpolated value from the lower resolution grids to obtain binding 
energy values with respect to each point in the higher resolution grid. 

Once the grid resolution is selected, each potential energy field component of one 
of the molecules, in the preferred embodiment the protein, is mapped onto a 

30 corresponding energy field component grid. This typically involves calculating for each 
grid point the potential energy field created by each atom site in the protein and 
summing all potentials to obtain the field potential. Since this step of mapping may only 
be carried out once, the effect of every atom site in the protein may be taken into 
account and all of the computation time required may be taken. For atoms very close to 

35 a grid point, where computational errors can result from selection outside the 
representation range of numbers in a computer, an arbitrary high value for their 
contribution to the potential field is taken. The relative spatial coordinates of each atom 
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site for the protein and for the ligand are known from the structural data obtained from 
existing databases, or from predicted structural data. 

The ligand being generally, but not necessarily, a much smaller molecule is easier 
4r~ nrrAA t« tVio r\rt*ferr*>i\ pmhnHimpnt for the liaand. the notential enerev 

Wj inayj uinu Ltiv giiu. m u»w ^»v»v*»vw ^ WM ,.w. -, — 0 7 a 

5 field components are not mapped onto the grid but rather the interaction field 
components are mapped onto the grid. The interaction field components relate to the 
charge quantities in the case of the electrostatic potential and the Van der Waals 
coefficients in the case of the Van der Waals potentials. For each atom site, the 
coefficients associated therewith are mapped onto the grid points surrounding each atom 

10 site in virtual space. The interpolation method for such mapping may be trilinear or a 
Gaussian distribution. In the case of a Gaussian distribution, each atom site has 
coefficients which are mapped onto a block of 5 x 5 x 5 grid elements. For each grid 
element, the contribution of each atom is calculated using the Gaussian distribution 
function. In particular, for the 1/r 6 and 1/r 12 components, best results are obtained 

15 using a Gaussian interpolation. As can be appreciated, the calculation of the values for 
the interaction field grid relating to the ligand involves carrying out a series of simple 
calculations with respect to each atom site in the ligand. The interaction component 
grids are built up for the particular rotational orientation of the ligand within the grid 
space by calculating the interaction field components for all of the atom sites in the 

20 ligand. 

Since the potential energy field grids and the corresponding interaction field 
component grids have the same grid resolution and grid size, a correlation between the 
two grids may be calculated. In the preferred embodiment, the discrete Fourier 
transform using a fast Fourier transform method is applied to each grid. The two 

25 transformed grids are then multiplied using element by element multiplication to obtain 
an intermediate product grid, and then the intermediate product grid is subjected to an 
inverse fast Fourier transform to obtain a grid representing for each point in the grid a 
binding energy for each component for each translational position in space between the 
protein and the ligand. By summing the resulting component grids for the binding 

30 energies, a single total binding energy grid is obtained. The total binding energy grid is 
scanned to determine a maximum binding energy value for the particular rotation of the 
ligand. As can also be appreciated, if an atom site happens to fall directly on a grid point 
as a result of the virtual rotation, the computational accuracy is not compromised. For 
this reason, it is further preferred to rotate the molecule whose interaction field 

35 components are being calculated and mapped onto the grid rather than rotating the 
molecule whose potential energy field components are being mapped. The method 
described thus far is carried out for every conceivable relative rotation between the 
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protein and the ligand. Normally, this would be all orientations unless some prior 
knowledge of the molecule system is known. In a parallel computer system, the 
mapping of the interaction field components of the ligand to produce the grids, the 
calculation of the forward and inverse Fourier transforms to obtain the correlation result 
5 and adding of the component grids to obtain maximum binding energy values could all 
be done as independent tasks. The most memory and computation intensive component 
is, of course, the correlation calculation which could be carried out using a vector 
processor. 

As illustrated in Figure 3, it is possible to carry out the above method initially at a 

10 course resolution, for example, at a sampling grid size of 0.4 A only to select one or 
more most energetically favorable relative rotations and then repeat the method selecting 
a small solid angle around each most energetically favorable relative rotation, i.e. the 
rotation of the ligand, to determine at a resolution of, for example, 0.25 A, what the 
exact maximum binding energy value is at each such rotational position in order to 

15 determine, based on the most energetically favorable binding energy value, the position 
value at the higher grid resolution. Also, when attempting to distinguish between a 
plurality of most energetically favorable maximum binding energy values, Boltzmann 
probabilities of occupying energy states corresponding to the most energetically 
favorable binding energy values can be calculated in order to select the most probable 

20 favorable binding energy value and generate as the position value the position value of 
the ligand corresponding to this maximum binding energy value. In the case that a few 
energetically favorable maximum binding energy values are the result of the above 
method and there is some uncertainty as to which binding energy value represents the 
likely binding site between the protein and the ligand, the method may be repeated 

25 following remapping of the interaction potential field components with the ligand being 
randomly shifted and rotated in space. The same rotations of the ligand are applied as 
before in order to obtain another set of maximum binding energy values having different 
computational error components. This procedure is repeated until agreement for the 
maximum binding energy value is obtained independently of the ligand starting position, 

30 greater confidence may be had in the resulting position value. 

As can be appreciated, the above method presumes that the molecules are rigid 
and are not subjected to outside forces. The change in ligand shape in the dynamical 
interaction between two molecules can be considered as subsequent coordinate sets of 
the ligand and the method can be carried out for each different coordinate set of the 

35 ligand to determine the position value for the energetically favorable binding site for 
each coordinate set of the ligand, and as a result, determine what is the most likely 
dynamical docking configuration of the protein ligand system. 
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Treatment of the dynamical motion of the protein can be interpreted as subsequent 
evaluations of the binding energies each using a different conformation of the protein. It 
can be appreciated that such an approach, though valid, is inefficient from the point of 

n onlv a small nart of the protein will adjust 

view or conipuiauGn wiw, oww, — — j , — j »- 

5 to a different conformation on the incoming Ugand. The potential energy components 
are then preferably mapped in two parts. First the potential energy field grid is mapped 
for the larger part of the protein which does not change conformation, and this first grid 
is stored and re-used each time. To calculate the total potential energy field grid for 
each conformation of the protein, the potential energy grid for the second part of the 

10 protein, which has assumed a different conformation, is calculated. The potential energy 
field grid of the first part is added to the potential energy grid of the second part to 
obtain the total potential energy field grid for the protein in the conformational state. 
This method of mapping the potential energy component grids is preferred because the 
computational time required to map the potential energy components onto the 

15 component grids is significant for larger molecules. 

In the preferred embodiment, a limited forcefield restricted to electrostatic and 
Van der Waals interactions is used to describe the interaction between protein and 
ligand. The interaction enthalpy at time / is described by: 

20 E h {t) = E.(t)^E a {t) + E r {t) e* 1 

in which Eh is the enthalpy, E e the electrostatic component and E a and E r the attractive 
and repulsive Van der Waals components, respectively. Molecular dynamics can in 
principle find the configuration of a system for which the enthalpy is minimum. 
25 Boltzmann statistics, however, prohibit the answer to be found in a reasonable amount 
of time due to energy barriers unless there is some a priori knowledge of where the 
ligand might dock. Several developments have aimed at circumventing this problem by 
introduction of simplifying approximations. Most often, time dependence is replaced by 
a molecular mechanics approach of calculating the interaction enthalpy for different 

30 positions of the Ugand relative to the protein. A priori knowledge of where the Ugand 
might dock can reduce the number of calculations that have to be performed. Such a 
priori knowledge is however not available in most cases and a considerable 
computational effort is required to assess potential docking sites. In order to reduce the 
computational effort, it is proposed to make use of a method based on the Convolution 

35 Theorem. The reduction in computational effort makes possible calculations based on a 
dynamical description of protein-Ugand interaction, and allows for calculations to be 
performed at sampling grid sizes of high resolution. 
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To first order approximation, dynamical motions of ligand and protein will be 
assumed to be uncorrelated. Eq. 1 can therefore be approximated by: 

E h (jJ p J^^EX^f p J s )^E a (fJ p J^^EXrJ P Js) eq.2 

5 

in which r denotes the distance vector between protein and ligand, and fp is the set of 
coordinates, or frame, of the protein at time t p andfs the coordinate frame of the ligand 
at time t 5 , tp and t^ being uncorrelated. Since both protein and ligand are sampled 
independently over time, eq. 2 becomes separable in fp andfs and computations can be 

10 performed over one fp at a time. For simplicity it is assumed that there is only one 
set of coordinates fp for the protein structure. 

In adiabatic dynamical simulations, i.e. when no external force is acting on a 
system of particles, the global velocity and global angular momentum of a system are 
constant and are reset to zero at the start of a simulation. Therefore, it is only possible to 

15 sample the relative atomic motions for 3N degrees of freedom instead of the usual 
3N+6, where N is the number of atoms. Since rigid body translational sampling is 
implicitly incorporated in f , eq. 2 must consequently be written explicitly in terms of 
rotational sampling utilizing Eulerian angles 0\ y 02, &i- 



20 



25 



E&O^Kfs) 

Eq. 3 can be further expanded in terms of atomic coordinates / and j of protein and 
ligand, respectively, as: 

eq. 4 

JJ- — 



in which q is the atom partial charge, B and A the attractive and repulsive components 
of a (6-12) Lennard- Jones potential energy description, r espectiv ely. N p and N s den ote 
the number of atoms in protein and ligand. Setting A i} = ^jA H A ji . and B u = ^B^B^. , all 

three terms in eq. 4 become separable: 



30 
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;=0 i=0 *0 / V\ C7 l' C7 2» l7 3»A/ 



eq. 5 



Each of the three energy terms in eq. 5 can be interpreted as a series of discrete 
correlations in r, , with r, e *R 3 , and s an instantiation of 0], 02, &3js- 

5 

To evaluate the interaction energy described in eq. 5, the protein structure must 
first be mapped onto a grid G. Each interaction energy term on the right hand side of eq. 
5 is equivalent to a sum of products of two functions each of which can be evaluated 
separately: one function corresponding to calculation of the protein potential energy 
10 field on grid G while the second represents the ligand interaction field. For each grid 
point g , the protein electrostatic potential energy v' p described by the first term on the 

right hand side of eq. 5 becomes: 
to^olg-ll 



with Tan interpolation function and r, the protein atom position of atom /. Mapping of 
the ligand component, which depends on s, onto grid G corresponds in eq.5 to using a 
delta interpolation function such that for each grid point gO) the ligand interaction field 

20 S' s becomes: 

with r j the atomic position of ligand atomy. Both second and third terms of eq. 5 
25 representing the Van der Waals potential energy contribution can be similarly 
elaborated. Continuing, the estimated electrostatic contribution to the interaction 
enthalpy can then be written as: 



G 
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with® the correlation operator. Making use of the Convolution Theorem the identity on 
the right hand side of eq. 8 is equivalent to: 

5 *:®(s))® v;(g) = 3-'(A*,(h<>)) V?(h)) eq. 9 

with 3 denoting the discrete Fourier transformation (DFT) and A and V the Fourier 
transformation of 5 and v, respectively, and h a vector in spatial frequency domain; * 
indicates the complex conjugate. Eq. 9 is critical in the current approach in that" it 
10 allows for a significant reduction of the computational effort from OfNxN) to 
0(N\og2N) with N representing the number of grid points. Inclusion of Van der Waals 
dispersion energy terms is straightforward and consists of the addition to eq. 9 of two 
additional discrete Fourier transformations analogous to the electrostatic potential 
energy term. 

15 The DFT algorithm inherently assumes the function being transformed to be a 

periodic function. To eliminate sampling end-effects due to overlap or cross-talk, the 
total number of grid points used to sample the ligand-protein forcefield must equal at 
least the sum of the grid dimensions used to represent the individual protein and ligand 
forcefield functions. Additionally, one should allow for a certain buffer as ligand size 

20 varies infs. In the present approach a representation is chosen that considers protein 
and ligand potential functions to be finite and non-periodic. Using the Convolution 
Theorem, interaction energy evaluation involves therefore Fourier decomposition to 
spatial frequency domain. 

In principle the method can be implemented as a single 7-dimensional DFT (i.e. 

25 f s y 0], 02* 03>ts) or even higher dimensional space (if temperature, pH, etc. are included). 

Although this will reduce computational effort compared to a full 7-dimensional 
sequential row-column DFT approach, the approach will require large amounts of 
computer memory and will be unnecessarily computationally intensive. Since variables 0 
1*02*03 and ts refer to single, uncorrected, instances of ligand conformation, 

30 computational effort can be reduced if each ligand conformation is treated separately. 
Efficiency of this approach can be illustrated as follows by considering for instance 
ligand conformational sampling as function of time in a dynamical simulation: if M 
samples of the ligand structure are recorded in time, then the total computational load 
for the evaluation of interaction energies using the Convolution Theorem will be of the 

35 order of M*0fNlog2H), with N the total number of grid elements used in a 3- 
dimensional grid representation. Conversely, treating the same problem as a full 4- 
dimensional representation would involve a computational load of the order of 
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OfM*Nlog2 (M*m significantly larger than the former. Even with more elaborate 
Cementations of higher dimensional FFT algorithms this difference is not dimimshed 
substantially. The present implementation uses therefore a sequent approach , « 6 ,62, 
T, .... ;^,ion i* evaluated against the protein potential grid; thus terms 

63 js: --'"^"" t . ntl ^ (C/ V in eq. 6 for the electrostatic potential, 

describing protein interaction potentials \c.j. v p m c H . 

and equivalent attractive and repulsive Van der Waals potentials), need only be 
TaLL once and can be kept in the spatial frequency domain Addmoo of terms ^ take 
place in spatia. frequency dotnaitt. After back-transforming (eq. 8) a Us, of lowest 

^sls^tLc positions are general* no, concern ,0 grid Cements an 
interpolation mechanism has bean introduced as indicated in eq 6 e, se« There juea 
„»„Lr of schemes including Gaussian-, ,ri-linear- or sphne m,crpola 0 on. C— 
tat erpola,ion scheme was chosen because i, gives a be«er approxtmatton ,han m-hn^ar 
insolation. A spline interpolation was considered ,0 be ,oo elaborate for on 
purposes. Eq. 6 was consequently evamated in two s,eps. If ,he numerator of eq. 6 ,a 
defined as: 

eq. 10 

a in which T now expUcidy denotes a Gaussian interpolation function and Qp the protein 
charge distribution function, eq. 6 can be evaluated as: 

25 in which - denotes the convolution operator. The identity at the right hand side of eqJT 
can again be evaluated in spatial ftequency domain according to the Convohmon 
Theorem. The Gaussian interpolation scheme, applied to a 0.25 A resolution gnd 
renroduces energies calculated analytically within 3 kcal/mol error Surularly, eq. 7 
S^T^OW-f.nW-'VW). wi* ' explicidy denotmg a Gauss,an 

30 interpolation function and can be evaluated readily 

For calculations using fine grid sizes, evaluanon of eq. 11 becomes 
impracticable. Though the convention approach is faster than poin, by pom, evaluanon 
grid dimensions become ,oo large for a reasonable sized computauonal platform To 
cLmven, this proWem, an intermediate approach is preferred wtach conststs of a rivo 

35 s,ep calculation. Firs, ,he Bgand in,eraction energy with the entire protem ,s scanned a, 
,ower grid resolution typically 0.4-O.SA (referred ,o as global pattern calculate: 
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considering the entire protein molecule) whereafter regions about lowest minima are 
selected for recalculation (see eq. 12 below) on a higher resolution grid typically 0.25 A 
and displacing the ligand's center of mass within a volume of 1 lxl lxl 1 A around each 
minimum (local pattern calculation). Protein electrostatic forcefield contributions on 
finer grids were calculated according to: 



eq. 12 



Since evaluation of eq. 12 is no longer an interpolation, calculations become extremely 
10 accurate at high sampling grid resolution. The Van der Waals non-bonded interaction 

terms are treated similarly. No distance cutoff was applied in either method for potential 

function setup, c.f. eqs 11 and 12. 

The implementation of the forcefield is generic in set up. Although CVFF force 

field was used throughout the calculations, other forcefields can be employed. The 
15 CVFF forcefield allows for intrinsic description of hydrogen bonds. Any explicit 

description of hydrogen bonding such as used in the AMBER force field can be 

implemented by modifying eq. 4 accordingly. The use of different power functions for 

the q, A and B term is also readily incorporated. 

In order to have a forcefield parameterization independent measure of protein- 
20 ligand interactions or 'hits', the probability value p, based on the Boltzmann distribution 

was used where: 

^ eq.13 

rt) - z-ip( ,<ai>, 

^ FV RT 

25 in which R is the molar gas constant and T the absolute temperature. This measure not 
only allows comparison of computational results between protein and ligand for different 
force fields but can also afford a measure of specific affinity by a ligand for a given 
binding site, allowing comparison of affinities among different ligands. 
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EXAMPLE 

The above described method has been implemented using the computer program 
provided in the microfiche appendix, and tested as summarized herein below. All 

o „^tt<.« in AMK5T r.++/C. and has been successfully used on a SGI Indigo 

R4000 and a SGI Power Indigo R8000, both single CPU systems, and a CRAY Y-MP 
C90 used in single CPU mode. Available machine specific FFT routines have been used 
in order attain maximum computational speed. In case of the CRAY C90, the scalar 
version of the FFT routines were used. Where needed an interface to the FFTPACK 
suite of FFT routines can be used. 



MATERIALS and METHOD 

Predictive accuracy of the approach according to the invention was assessed 
using seven different protein-ligand complexes refined to high resolution (1.6 - 2.1 A); 
coordinate files were obtained from the Brookhaven Protein Data Bank (PDB). The 

15 complexes examined were as follows: phospholipase A 2 complexed with a transition- 
state analog (lpoe), ^-trypsin complexed with benzamidine (3ptb), ribonuclease A 
complexed with cytidylic acid (lrob), lysozyme complexed with tri-N-acetylchitotnose 
(lhew) HTV1 protease complexed with U75875 inhibitor (lhiv), dihydrofolate 
reductase complexed with methotrexate (4dfr) and HIV1 protease complexed with 

20 XK263 inhibitor (lhvr). 

Hydrogen atom generation 

Prior to docking calculations hydrogen atom building was done using InsightH 
(TM of Biosym Technologies, San Diego) for all structures; unless indicated otherwise a 

25 pH of 7 was assumed during hydrogen atom generation. Partial charges and CVFF force 
field parameters were assigned using the BUILDER option in InsightH. Each ligand was 
energy rninimized for 100 cycles using DISCOVER (TM of Biosym Technologies, San 
Diego). Protein coordinates were not allowed to vary during refinement. Except for 
lysozyme, protein-ligand interaction energies did not show improvement from their 

30 starting values. For lysozyme, a significant reduction in interaction energy of 54.2 
kcal/mol was obtained and corresponded to an RMS shift of 1.16A by the ligand from 
its crystaUographic determined coordinates. The translational shift in centers of mass 
between observed and calculated ligand positions was however only 0.04 A suggesting 
imprecise ligand positioning, a consequence of unrelieved close hydrogen atom contacts. 

35 During all subsequent calculations, original crystaUographic coordinates were used 
except for lysozyme where the refined coordinates were chosen. 
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Algorithm Comparison 

In order to compare the present Fourier method with other methods, two 
different computational algorithms were implemented which yielded identical interaction 
energy values. The first involved a straightforward atom-to-atom energy calculation, 
5 based on charge-charge and Van der Waals energy contributions. This implementation, 
in which the computational load is dependent on the number of atoms in both protein 
and ligand, was optimized as to minimize the number of multiplications. The second 
algorithm was based on an atom-to-grid method, were the protein potential functions for 
electrostatic and Lennard- Jones contributions were mapped onto a 0.25 A resolution 

10 sampling grid prior to energy calculations. In this implementation, the computational 
load is dependent only on the number of ligand atoms and was also optimized as to 
minimize the number of multiplications, ligand atomic positions were subjected to the 
Gaussian interpolation scheme described above. For both algorithms, and similar to the 
fast Fourier method, no distance cutoff was applied during calculations. Comparisons 

15 represented identical number of configurations for protein-ligand complexes of 
HIV/U75875, LYS/NAG, FOLATE/MTX and TRYP/BEN. 

RESULTS 

Docking site determination 

20 In order to minimize bias, ligand starting positions were randomized prior to 

calculation in orientation and translation with respect to the target protein. High 
computational memory demands required that each ligand-protein complex be analyzed 
using the two-step evaluation procedure described above. All possible protein-ligand 
binding sites were first explored at 0.5 A grid size resolution from which lowest minima 

25 were extracted for local exploration at 0.25 A grid size resolution. Shown in Table la is 
a comparison between the crystallographic determined structures and nearest 
corresponding minima for complexes which could be successfully determined using a 
randomized starting orientation and translation of the ligand. Also indicated are Euler 
sampling and relative ranking of the minima used in comparison with the 

30 crystallographic determined structure. Repeated calculations did not show differences as 
to the ranking of the minima in the cases of HIV/U75875, LYS/NAG and TRYP/BEN. 
In the case of the RNASE/CYT complex repeated calculations did show differences in 
peak rank of the nearest corresponding minima and are also shown in Table la. As 
discussed below these multiple calculations, using different randomized starting 

35 conditions, were used to further examine energetically similar conformations and 
enabled unique identification of one best ranking peak based on the rescaling property of 
Boltzmann statistics. 
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Calculations for three complexes in which the ten lowest minima failed to predict 
ligand's center of mass to within one bond length of the ligand's crystallographic center 
of mass were repeated without randomization of the ligand's starting position. In all 
three cases, successful predictions were obtained coinciding to within one bond length of 
respective centers of mass of the crystallographically determined structures. Results are 
shown in Table lb. 

Table la: Protein-ligand docking of four crystallographically determined complexes 
using a 0.25 A grid size interval, applying Euler sampling with translation^ 



15 



protein-ligand 
complex 



20 



Euler sampling 
interval^) 



HTV7U75875 20 
LYS/NAG 30 
TRYP/BEN 45 



r 1 


d(A) 2 


e o 3 


1 


1.11 


8.0 


1 


0.47 


6.8 


1 


0.55 


14.8 


1C0 5 


0.90 


21.4 6 


1(2) 5 


0.98 


23.8 


2 (3) 5 


0.91 


21.9 


4(4) 5 


0.75 


19.4 



RNASE/CYT 4 12 

RNASE/CYT 12 

RNASE/CYT 12 

RNASE/CYT 12 



Table lb: Protein-ligand docking of three crystaUographically determined complexes 
using a 0.25 A grid size interval without translational and rotational 
randomization of ligand's st arting position. 




1 r defines the rank of the interaction energy relative to the lowest calculated global 
minimum and corresponding to the protein ligand complex whose center of mass 
calculated closest to that of the crystallographically observed complex. 

2 translational shift 6 was defined as the distance between the centers of mass of 
predicted and observed protein-ligand complexes. 
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3 rotational shift e was obtained from the trace of the rotation matrix of the 
transformation operation relating calculated ligand position and observed ligand 
position in the protein-ligand complex: Tr[R] = 2cos 9 + 1, with 9 the rotation angle. 

4 in the case of RNASE/CYT complexes, the most probable docking site was selected 
5 based on Boltzmann statistics from the calculations listed (see also text). 

5 shown in parenthesis are the relative ranking in interaction energy of the 
RNASE/CYT complexes. 

6 rotational shift of the predicted RNASE/CYT complex is larger than the actual Euler 
sampling interval and is a consequence of mapping atoms to a finite grid. 

10 7 shown in parenthesis are translational shifts obtained from comparative calculations 
using only translational randomization of ligands' starting positions. 
8 zero values are in fact expected as no rotational randomization was applied and 
indicated that no false minima were obtained from the set of tested configurations. 

1 S Resolution dependency 

Protein ligand complexes shown in Table la which were predicted using 
randomized ligand's starting positions were further assessed as a function of grid size 
resolution dependency. Calculations were performed at 2.0 A, 1.0 A, 0.7 A, 0.5 A, 0.4 
A, and 0.25 A grid size resolution, respectively (Table 2). Due to memory limitations 
20 only local pattern calculations were performed around the crystallographic observed 
structures for resolutions of 0.4 A and 0.25 A Repeated calculations using different 
randomized starting conditions for the ligands in the above cases did not show any 
significant changes in the results shown in Table 2. 



25 



WO 98/47088 



PCT/CA98/00327 



- 18- 

Table 2: Resolution dependency of protein-ligand complex prediction. 











global 








local 




2.0 




1.0 




0.7 




0.5 




0.4 




0.25 


protein-ligand 


5 3 


r 2 


5 


r 


5 


r 


8 


r 


5 


r 


5 r 


complex 1 
























RNASE/CYT 


15. 


64 


7.3 


63 


4.9 


2 


1.8 


20 


1.7 


2 


1.0 1 


HTV7U75875 


7 
21. 


16 


17. 


85 


12. 


67 


8.0 


67 


3.6 


9 


1.1 1 




1 




8 




2 












0.5 1 


LYS/NAG 


21. 


87 


16. 


57 


5.7 


47 


3.0 


25 


0.4 


2 




0 




0 
















0.5 1 


TRYP/BEN 


24. 


85 


14. 


75 


7.3 


59 


6.0 


8 


0.9 


8 




1 




4 



















observed ligand structure. 

2 rank of interaction energy minimum defined as in table 1 . 

3 translation^ shift 8 defined as in table 1 



Dynamical docking calculations 

10 In order to test the methodology using a dynamical description of ligand, the 

TRYP/BEN complex was examined. A series of frames were generated corresponding 
to an in vacuo dynamical simulation of the ligand at 300° K for 1.2 ps at 1.0 fs step 
intervals and frames sampled every 50 steps. The resulting 23 frames, one of which 
corresponded to the crystallographic solution, were used as input for interaction energy 

1 5 calculation. Frames depict essentially a 1 80° rotation of the amidino group with respect 
to the phenyl moiety of benzamidine. Each frame was evaluated using a Euler sampling 
interval of 45° on a grid of 0.25 A resolution. The resulting 5 lowest minima are shown 
in table 3. Column 7 in table 3 shows the minima ranking after correction for the 
difference in ligand internal energy (column 5) between that of the in vacuo simulation 

20 and that calculated for the crystallographic structure, differences in ligand internal 
energy being a consequence of conformational strain introduced by the simulation. 
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Table 3 : Docking of Trypsin-benzamide complex using a dynamical 
description 



ligand 



initial 


d(A) 


en 


Eenthalpy 


AEcorr 


Etot 


final 


ranking 






kcal/mol 


kcal/mol 


kcal/mol 


ranking 


1 


1.68 


172 


-40.09 


16.01 


-24.08 


10 


2 


2.04 


139 


-36.12 


8.29 


-27.83 


8 


3 


0.25 


2 


-35.90 


10.53 


-25.37 


9 


4 


0.354 


0 


-35.66 


0.0 


-35.66 


1 


5 


0.354 


0 


-35.60 


0.0 


-35.60 


2 



1 based on 23 frames sampled over 1.2 ps. 

2 ligand internal energy difference corresponding to conformational strain calculated 
relative to the crystallographic ligand conformation. 



10 



Computational load 

Computations were performed on three different platforms. Table 4 summarizes 
CPU times utilized for each of the 10 non-similar calculations listed in Tables la, lb and 
3 The times listed are without the overhead calculation time needed for the potential 
function setup. 



Table 4 : CPU time for local pattern calculations using 0.25 A grid resolution 



complex 


grid size 


no. of 
orientations 


platform 


CPU-time 


FOLATE/MTX 


118x118x118 


612 


R8000 


16h7 


HTV7U75875 


146x145x145 


612 


R8000 


27h46 


LYS/NAG 


132x132x132 


112 


C90 


lh42 


LYS/NAG 


132x132x132 


264 


R8000 


4hl 


PLA2/GEL 


112x111x111 


112 


R4000 


3h30 


RNASE/CYT 


84x84x84 


1740 


R8000 


9h27 


TRYP/BEN 


80x80x80 


24 


C90 


5m9 


TRYP/BEN 


80x80x80 


112 


R4000 


lh9 


TRYP/BEN(dyn) 


80x80x80 


2352 


R4000 


10h8 


HTWXK263 


142x141x141 


112 


R8000 


4h54 



15 To assess gain in computational speed, CPU times of the present fast Fourier 

transform method were compared with results of identical computations using atom-to- 
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atom and atom-to-grid algorithms described above. For four protein-ligand complexes, 
of different computational complexity in terms of the sampling grid size used, the last 
column in Table 5 lists the CPU computation times corresponding to each algorithm. 

_ • > .1. :*u - ir\o cUr comnKnrr Interval 1 miner a 0 25 A 

Computations involved caicuiauous wmi a j\> .... a 

resolution sampling grid placed within a llxllxll A box and centered around the 
respective ligands' crystallographic positions. Also indicated are sampling grid size and 
number of atoms. In cases of atom-to-atom and atom-to-grid computations, CPU times 
listed were obtained by extrapolation based on sample calculations corresponding to one 
hundred ligand conformations. 



DISCUSSION OF EXAMPLE 
Docking site determination 

The present method is able to reproduce the crystallographic observed liganded 
complexes with a high rate of success. A result was considered successful when the 
15 translations difference between calculated and crystallographically observed ligand's 
center of mass was less than one carbon-carbon bond length. Except for two 
calculations in case of the RNASE/CYT complex discussed below, the lowest energy 
minimum of each protein-ligand complex, shown in Table la, coincided with the 
crystallographic structure to within less than one bond length. Local pattern calculations 
20 for the RNASE/CYT complex (Table la) showed, contrary to other interaction energy 
calculations, minima with only slight differences in binding energy (< 6 kcal/mol). Two 
of the predicted centers of mass were decidedly different from crystallographic observed 
complexes suggesting that predictive accuracy may be dependent on starting position 
randomization. The lowest interaction energy for the RNASE/CYT complex 
25 calculations corresponded nevertheless to the predicted center of mass having the least 
difference, 0.90A, with respect to that of the crystallographic complex. Center of mass 
of the second minimum differed by 0.98 A from the crystallographic structure while the 
two latter minima predicted a center of mass more than 2 A from the crystallographic 
complex. Energy differences between the lowest interaction energy minimum and 
30 remaining minima corresponded to 2.2 kcal/mol, 3.0 kcal/mol and 5.4 kcal/mol while 
Boltzmann probabilities for each minimum, calculated according to eq. 13 , were 0.92, 
0.54, 0.39 and 0.29 respectively and are insufficient for order of magnitude minima 
discrimination typically observed in the other protein-ligand complexes. To enhance 
global minimum discrimination, interaction energies of the four calculations were 
35 appropriately summed and Boltzmann probabilities recalculated. Minima ranking 
following recalculation yielded a probability of 0.89 for the lowest minimum and which 
corresponded to the previous global minimum. Boltzmann probability for the first 
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predicted protein-iigand complex not coincident with the crystallographic observed 
structure and now ranking third was 0.005. Summation of repeated interaction energy 
calculations thus afforded improved discrimination of the predicted crystallographic 
complex by nearly two orders of magnitude. In general, choice of the true docking site 
5 amongst alternate sites is largely dependent on the energy difference among docking 
sites. Cumulative Boltzmann probabilities from repeated calculations with random 
starting positions can be used to accurately discriminate the true docking site in cases 
where it can not be distinguished solely from energy differences. 

In three of the seven selected complexes, docking calculations apparently failed to 

10 predict the crystallographic solution when a full randomized starting positions were 
employed; without translational and rotational randomization, however, the 
crystallographic solutions were unequivocally obtained and corresponded to the 
expected Euler sampling angle of zero degrees (Table lb). Translational randomization 
at 0.25 A resolution also reproduced the crystallographic solutions (data not shown). 

15 The seeming failure to predict the crystallographic solution from a full randomized 
starting position is apparent if one considers the 'tightness of fit' of these ligands at their 
target site. In order to reproduce the crystallographic solution using randomized starting 
coordinates, exploration using a 1-2° Euler sampling interval would be required. The 
FOLATE/MTX and PLA2/GEL complexes show similar tight fitting energy profiles 

20 (not shown) requiring a Euler sampling interval in the order of 4-5° for adequate 
minimum exploration. Currently, the Euler sampling interval cannot be less than 12° due 
to computational limitations. The fact that the non-randomized calculations reproduced 
the crystallographic solution unequivocally suggests that for a Euler sampling interval of 
less than 12°, a successful solution should be obtained using randomized starting 

25 position. Increasing the Euler sampling interval should also improve agreement between 
predicted and crystallographically observed protein-iigand complexes in aforementioned 
calculations that employed randomized starting coordinates (Table la). 

Resolution dependency 

30 From table 2, it is evident that calculations at fine grid size resolution are essential 

for obtaining agreement with the crystallographic observed complexes. At low grid size 
resolution (> 0.7 A) the sampled energy surface was essentially flat and no definitive 
energy minima could be discerned. At higher resolutions (0.7 A - 0.4 A) the predicted 
centers of mass approached the observed structure; the grid, however, was sampled still 

35 too coarse as the calculated energy corresponding to the crystallographic observed 
complex was still too high (i.e. ranking relative to lowest minima). At high resolution 
(0.25 A) differences between centers of mass were least and lowest energy minima for 
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docked ligands reflected the crystallographic observed structures. Conversely, low 
resolution calculations without randomization of the ligand's starting conditions, 
equivalent to extremely fine Euler sampling, show the same 'flat' energy surfaces as with 

ctirvivnt Thftse results sueeest that 
randomized starung positions ^aiwiio«u. w - — • — 

5 putative binding sites can only be predicted using high resolution calculations, < 0.4 A 
grid size. A previous report (Vakser LA., Protein docking for low-resolution structures, 
Protein Eng. 8(4):37 1-377, 1995) using 7.0 A grid size resolution and employing a 
shape complementary approach obtained statistically reasonable results in centers of 
mass prediction. It should be noted, however, that for this method to be generally 

10 successful the shape and size of the ligand must show significant features resolvable at 
7.0 A resolution. 



Dynamical docking calculations 

Results of the TRYP/BEN complex using a dynamical description for the 
15 benzamidine molecule (Table 3) shows the crystallographically observed structure 
among the highest ranking solutions. Allowance for ligand conformational strain 
introduced by the dynamical simulation modifies the rank order of the solutions such 
that the crystallographic structure corresponds to the two highest ranking solutions and 
the three previous lowest minima are now ranked 8th, 9th and 10th, respectively. The 
20 top ranking structures show the phenyl ring of benzamide in approximately identical 
orientations, underscoring the importance of the hydrophobic interaction between the 
phenyl moiety of benzamidine and trypsin. 

Computational load 

25 Evaluation of the interaction energies corresponding to the large numbers of 

ligand conformations in a protein-ligand complex, the present method represents a 
considerable gain in computational speed. With respect to the direct sum computation of 
correlations as expressed at the right hand side in eq. 8., considerable improvement m 
computational speed was gained using the FFT approach, which is O^Nlog 2 N; rather 

30 than O^NxN; in terms of the number of operations performed where N represents the 
number of grid elements. Computations for each of the cases, shown in Table 5, 
corresponded to interaction energies calculated for 22,488,576 different configurations, 
and based on the CPU times required for each computation, it is evident that the present 
method is superior with respect to atom-to-atom and atom-to-grid computations. 



35 
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Table 5: CPU times for different computational algorithms 



protein-ligand 
complex 



evaluation method 



grid size 



computational time 



HIV/U75875 



atom-to-atom 

atom-to-grid 

FFT 4 



^=1561 5 , Afr=121 6 
Ny=l2\, 44x44x44 
146x145x145 



179.6 days 
47.4 days 
27.7 hours 



LYS/NAG 



atOm-to-atom 

atom-to-grid 

FFT 



^=1960, JVs=84 
Ns=&4, 44x44x44 
132x132x132 



212.8 days 
33.0 days 
4.0 hours 



FOLATE/MTX 



atom-to-atom 

atom-to-grid 

FFT 



N p =244\, Ns=57 
Ns=57, 44x44x44 
118x118x118 



179.8 days 
22.3 days 
6.9 hours 



10 



atom-to-atom A/p=3219, Afe=17 71.3 days 

TRYP/BEN atom-to-grid ^=17,44x44x44 6.6 days 

80x80x80 1.2 hours 



FFT 



1 Interaction energies were calculated at 0.25 A resolution, within a box of llxllxll 
A 3 about the substrate's crystallographic position, using 30° Euler sampling 
(22 488 576 ligand conformations) on a single CPU SGI R8000 platform. 

2 Atom-to-atom interaction energy evaluation; indicated CPU time values are 
extrapolated from a sample of 100 calculated conformations. Algorithm optimized for 
multiplicative efficiency. 

3 Protein potential function was sampled on 0.25 D grid prior to energy evaluation; 
indicated CPU times exclude potential function setup and are extrapolated from a 
sample of 100 conformations calculated. Ligand atomic positions were interpolated 
by Gaussian interpolation/Algorithm optimized for multiplicative efficiency. 

4 Proposed method. 

5 Np represents number of atoms in protein molecule. 

6 N s represents number of atoms in ligand molecule. 
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The method according to the invention is robust in nature. In the above example, 
of seven protein-ligand complexes, the crystallographic solution was successfully 
reproduced. In four of the seven studies, good agreement could be obtained using an 
5 unbiased approach. Due to the relatively tight fit of ligand in its binding site in the 
remaining three cases, the crystallographic observed complex failed to be reproduced 
from randomized starting position of the ligand, whereas no randomization yielded 
success. This apparent failure suggest that accurate prediction using randomized starting 
positions can be obtained with a finer Eulerian sampling interval than the current 

10 minimum of 12°. 

Boltzmann probabilities represent an useful tool to discriminate against artefactual 
docking sites. In the case of RNASE/CYT complex, four calculations at 0.25 A grid size 
resolution showed that two of four highest ranking peaks did not predict the 
crystallographic observed complex, although the highest ranking peaks for each 

15 calculation had energies that were similar to within 6 kcal/mol. Combined Boltzmann 
probabilities clearly identified the lowest energy minimum as corresponding to the 
crystallographic observed complex. Boltzmann probabilities calculated by summation of 
interaction energies represents a useful tool to improve certitude of predicted docking 
sites. 

20 Remarkably, without any explicit parameterization to take solvation into account, 

all seven test cases yielded unambiguously the crystallographic solution. The high 
success in reproducing all crystallographic observed structures suggests enthalpy of 
formation is decisive in detennining observed complex stability in case of the selected 
protein-ligand structures. 

25 The analysis clearly indicates that for predictive accuracy quantitative calculations 

should be performed at least at 0.4 A grid size resolution. Satisfactory results are 
obtained at 0.25 A resolution. The two-step computational procedure used in our 
approach: calculation at 0.4 A resolution, selection of lowest minima, followed by a 
series of local calculations about lowest minima at 0.25 A resolution is in principle not 

30 needed. A single calculation at 0.25 A resolution is preferred. Computational resources 
may impose, however, a two-step procedure. 

A dynamical description of protein-ligand complex formation has been 
demonstrated for the TRYP/BEN complex. Although only a dynamical description for 
ligand was used, a combined protein ligand dynamical description would be equivalent 

35 to successive calculations of the protein/ligand complex each corresponding to a 
different conformational state of the protein. The additive nature of the potential 
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functions allows a judicious approach in potential function setup thus saving overall 
computation time. 

The present invention may be used in predicting putative ligand binding sites. 
Although the method according to the invention is superior, considering computational 

5 load, compared to similar methods of calculating protein-ligand interaction enthalpies, a 
considerable improvement is expected when the program is ported to a vector/parallel 
computational platform. The method and especially the FFT routine which is at the heart 
of the method is inherently suitable for vectorization and parallel computing. In addition, 
calculation of interaction energies as a function of Eulerian angles (Oj, 02, 03) or 

10 conformation (t s ) can be performed in parallel. With the imminent availability of larger 
computer memories, combined with computer vectorization and parallehsm at desk-top 
level, the methodology should become suitable even for calculations on large 
protein/ligand complexes including protein-protein interactions. 
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CLAIMS: 



1. A method for generating a position value for an energetically favorable binding 
site between two molecules, comprising the steps of: 
5 a) obtaining potential energy structural data for each atom site in said molecules; 

b) selecting a grid resolution corresponding to a sampling grid size substantially 
smaller than an average distance between bonded atoms in said molecules; 

c) selecting a range of relative rotations between said two molecules; 

d) mapping a plurality of potential energy field components of one said molecules 
10 onto a corresponding one of a plurality of energy field component grids having said 

resolution with said one molecule at a predetermined rotation and position, wherein each 
grid point of said component grids has a potential energy value interpolated from said 
potential energy structural data; 

e) mapping a plurality of interaction field components of another of said molecules 
15 onto a corresponding one of a plurality of interaction component grids having said 

resolution with said other molecule at a predetermined rotation and position, said 
interaction component corresponding to coefficients of a forcefield between said 
molecules, wherein each grid point of said component grids has an interaction value 
interpolated from said potential energy structural data; 
20 f) calculating a correlation between each said potential energy field component grid 
and each said interaction field grid to obtain a grid of molecule binding energy values 
representing a binding energy of said two molecules in said relative rotation for relative 
translational positions in space between said molecules; 

g) determining at least one maximum of said binding energy values and recording 
25 said relative translational positions for said maximum binding energy values; 

h) rotating at least one of said molecules according to each said relative rotation in 
said range, repeating said step of mapping for said at least one of said molecules and 
subsequently repeating said steps (£) and (g) of calculating and detentuning for each said 
relative rotation; and 

30 i) selecting an energetically favorable one of said relative rotations in said range 
and said relative translational positions based on said maximum binding energy values to 
generate said position value for an energetically favorable binding site between said two 
molecules. 

35 2. The method according to claim 1, wherein said at least one of said molecules 
rotated is said other molecule only, whereby said interaction field component grid is 
mapped for each rotation of said other molecule and computation time is reduced. 
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3. The method according to claim 1, wherein said step of calculating a correlation 
comprises calculating a discrete Fourier transform of said component grids, calculating a 
product of said transformed grids, and using an inverse Fourier transform to obtain said 

5 grid of molecule binding energy values. 

4. The method according to claim 2, wherein said step of calculating a correlation 
comprises calculating a discrete Fourier transform of said component grids, calculating a 
product of said transformed grids, and using an inverse Fourier transform to obtain said 

1 0 grid of molecule binding energy values. 

5. The method according to claim 1, wherein said step of selecting said 
energetically favorable one of said relative rotations and translational positions 
comprises selecting a plurality of most energetically favorable ones of said maximum 

15 binding energy values, and calculating Boltzmann probabilities of occupying energy 
states corresponding to said plurality of most energetically favorable ones of said 
maximum binding energy values, said energetically favorable one of said relative 
rotations and translational positions selected corresponding to a rotation and 
translational position of one of said plurality of most energetically favorable ones of said 

20 maximum binding energy values having a highest one of said Boltzmann probabilities 
calculated. 

6. The method according to claim 2, wherein said step of selecting said 
energetically favorable one of said relative rotations and translational positions 

25 comprises selecting a plurality of most energetically favorable ones of said maximum 
binding energy values, and calculating Boltzmann probabilities of occupying energy 
states corresponding to said plurality of most energetically favorable ones of said 
maximum binding energy values, said energetically favorable one of said relative 
rotations and translational positions selected corresponding to a rotation and 

30 translational position of one of said plurality of most energetically favorable ones of said 
maximum binding energy values having a highest one of said Boltzmann probabilities 
calculated. 

7. The method according to claim 3, wherein said step of selecting said 
35 energetically favorable one of said relative rotations and translational positions 

comprises selecting a plurality of most energetically favorable ones of said maximum 
binding energy values, and calculating Boltzmann probabilities of occupying energy 
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states corresponding to said plurality of most energetically favorable ones of said 
maximum binding energy values, said energetically favorable one of said relative 
rotations and translation^ positions selected corresponding to a rotation and 

rallv favorable ones of said 

traXlSldUUIlill pusmuu ui uus ui oiuu piuiunvj ~~ w- 0 j — 

5 maximum binding energy values having a highest one of said Boltzmann probabilities 
calculated. 

8. The method according to claim 4, wherein said step of selecting said 
energetically favorable one of said relative rotations and translation^ positions 

10 comprises selecting a plurality of most energetically favorable ones of said maximum 
binding energy values, and calculating Boltzmann probabilities of occupying energy 
states corresponding to said plurality of most energetically favorable ones of said 
maximum binding energy values, said energetically favorable one of said relative 
rotations and translation^ positions selected corresponding to a rotation and 

1 5 translations position of one of said plurality of most energetically favorable ones of said 
maximum binding energy values having a highest one of said Boltzmann probabilities 
calculated. 

9. The method according to claim 1 , wherein said grid resolution selected is a lower 
20 grid resolution, and said position value generated is a coarse position value, further 

comprising steps of: 

selecting a fine grid resolution higher than said lower grid resolution; 

selecting a fine range of relative rotations between said two molecules proximate 
to a rotation value of said position value; 
25 repeating said steps (d) through (i) for said fine grid resolution and for said fine 

range to generate said position value, whereby computation speed may be increased by 
using a lower grid resolution to obtain said coarse position value, and grid mapping at 
said fine resolution may be done using smaller grid dimensions while obtaining a more 
accurate final position value. 

30 

10. The method according to claim 2, wherein said grid resolution selected is a lower 
grid resolution, and said position value generated is a coarse position value, further 

comprising steps of: 

selecting a fine grid resolution higher than said lower grid resolution; 
35 selecting a fine range of relative rotations between said two molecules proximate 

to a rotation value of said position value; 
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repeating said steps (d) through (i) for said fine grid resolution and for said fine 
range to generate said position value, whereby computation speed may be increased by 
using a lower grid resolution to obtain said coarse position value, and grid mapping at 
said fine resolution may be done using smaller grid dimensions while obtaining a more 
5 accurate final position value. 

11. The method according to claim 3, wherein said grid resolution selected is a lower 
grid resolution, and said position value generated is a coarse position value, further 
comprising steps of: 

10 selecting a fine grid resolution higher than said lower grid resolution; 

selecting a fine range of relative rotations between said two molecules proximate 
to a rotation value of said position value; 

repeating said steps (d) through (i) for said fine grid resolution and for said fine 
range to generate said position value, whereby computation speed may be increased by 
15 using a lower grid resolution to obtain said coarse position value, and grid mapping at 
said fine resolution may be done using smaller grid dimensions while obtaining a more 
accurate final position value. 

12. The method according to claim 4, wherein said grid resolution selected is a lower 
20 grid resolution, and said position value generated is a coarse position value, further 

comprising steps of: 

selecting a fine grid resolution higher than said lower grid resolution; 

selecting a fine range of relative rotations between said two molecules proximate 
to a rotation value of said position value; 
25 repeating said steps (d) through (i) for said fine grid resolution and for said fine 

range to generate said position value, whereby computation speed may be increased by 
using a lower grid resolution to obtain said coarse position value, and grid mapping at 
said fine resolution may be done using smaller grid dimensions while obtaining a more 
accurate final position value. 

30 

13. The method according to claim 5, wherein said grid resolution selected is a lower 
grid resolution, and said plurality of most energetically favorable ones of said maximum 
binding energy values are coarse values, further comprising steps of: 

selecting a fine grid resolution higher than said lower grid resolution; 
35 selecting a plurality of fine ranges of relative rotations between said two 

molecules proximate to a rotation value corresponding to each of said plurality of most 
energetically favorable ones of said maximum binding energy values; and 
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repeating said steps (d) through (i) for said fine grid resolution and for each of 
said fine ranges to obtain a fine plurality of most energetically favorable ones of said 
maximum binding energy values for calculating said Boltzmann probabilities, whereby 
computation speea may oe increascu uy usuig a iuwa gim ^oumuu-. ^ — — 
5 coarse position value, and grid mapping at said fine resolution may be done using 
smaller grid dimensions while obtaining a more accurate final position value. 

14. The method according to claim 6, wherein said grid resolution selected is a lower 
grid resolution, and said plurality of most energetically favorable ones of said maximum 

10 binding energy values are coarse values, further comprising steps of: 

selecting a fine grid resolution higher than said lower grid resolution; 
selecting a plurality of fine ranges of relative rotations between said two 
molecules proximate to a rotation value corresponding to each of said plurality of most 
energetically favorable ones of said maximum binding energy values; and 

15 repeating said steps (d) through (i) for said fine grid resolution and for each of 

said fine ranges to obtain a fine plurality of most energetically favorable ones of said 
maximum binding energy values for calculating said Boltzmann probabilities, whereby 
computation speed may be increased by using a lower grid resolution to obtain said 
coarse position value, and grid mapping at said fine resolution may be done using 

20 smaller grid dimensions while obtaining a more accurate final position value. 

15. The method according to claim 7, wherein said grid resolution selected is a lower 
grid resolution, and said plurality of most energetically favorable ones of said maximum 
binding energy values are coarse values, further comprising steps of: 

25 selecting a fine grid resolution higher than said lower grid resolution; 

selecting a plurality of fine ranges of relative rotations between said two 
molecules proximate to a rotation value corresponding to each of said plurality of most 
energetically favorable ones of said maximum binding energy values; and 

repeating said steps (d) through (i) for said fine grid resolution and for each of 

30 said fine ranges to obtain a fine plurality of most energetically favorable ones of said 
maximum binding energy values for calculating said Boltzmann probabilities, whereby 
computation speed may be increased by using a lower grid resolution to obtain said 
coarse position value, and grid mapping at said fine resolution may be done using 
smaller grid dimensions while obtaining a more accurate final position value. 

35 
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16. The method according to claim 8, wherein said grid resolution selected is a lower 
grid resolution, and said plurality of most energetically favorable ones of said maximum 
binding energy values are coarse values, further comprising steps of: 

selecting a fine grid resolution higher than said lower grid resolution; 

5 selecting a plurality of fine ranges of relative rotations between said two 

molecules proximate to a rotation value corresponding to each of said plurality of most 
energetically favorable ones of said maximum binding energy values; and 

repeating said steps (d) through (i) for said fine grid resolution and for each of 
said fine ranges to obtain a fine plurality of most energetically favorable ones of said 

10 maximum binding energy values for calculating said Boltzmann probabilities, whereby 
computation speed may be increased by using a lower grid resolution to obtain said 
coarse position value, and grid mapping at said fine resolution may be done using 
smaller grid dimensions while obtaining a more accurate final position value. 

15 17. The method according to claim 1, wherein said energy field components 
comprise a first electrostatic component and Van der Waals components. 

18. The method according to claim 17, wherein said Van der Waals components 
comprise a second 10" 6 component and a third 10" 12 component. 

20 

19. The method according to claim 1, wherein said grid resolution provides a 
sampling grid size less than 0.45A. 

20. The method according to claim 1, wherein said grid resolution is different for 
25 different ones of said field components, whereby those field components having a 

function which is more sensitive to position can be calculated at a higher grid resolution. 
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AMENDED CLAIMS 
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16. The method according to claim 8, wherein said grid resolution selected is a 
lower grid resolution, and said plurality of most energetically favorable ones of said 
maximum binding energy values are coarse values, further comprising steps of: 

selecting a fine grid resolution higher than said lower grid resolution; 
5 selecting a plurality of fine ranges of relative rotations between said two 

molecules proximate to a rotation value corresponding to each of said plurality of most 
energetically favorable ones of said maximum binding energy values; and 

repeating said steps (d) through (i) for said fine grid resolution and for each of 
said fine ranges to obtain a fine plurality of most energetically favorable ones of said 
10 maximum binding energy values for calculating said Boltzmann probabilities, whereby 
computation speed may be increased by using a lower grid resolution to obtain said 
coarse position value, and grid mapping at said fine resolution may be done using 
smaller grid dimensions while obtaining a more accurate final position value. 

15 17. The method according to claim 1, wherein said energy field components 
comprise a first electrostatic component and Van der Waals components. 

18. The method according to claim 17, wherein said Van der Waals components 
comprise a second 1(H> component and a third 10" ^2 component. 

20 

19. The method according to claim 1, wherein said grid resolution is different for 
different ones of said field components, whereby those field components having a 
function which is more sensitive to position can be calculated at a higher grid 
resolution. 

25 

20. The method according to one of claims 1 to 19, wherein said grid resolution 
provides a sampling grid size less than 0.45 A. 

21. The method according to claim 20, wherein said grid resolution is 
30 approximately 0.25 A. 

22. The method according to claim 1,2,5,6,9,10,13,14,17,18,19,20 or 21, wherein 
said step (f) comprises calculating a transform of each said potential energy field 
component grid and of each said interaction field component grid, calculating a 

35 function of said transformed grids, and using an inverse transform on a result of said 
function to obtain said correlation. 
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